# Sub-threshold Standard Cell Library Design for Ultra-Low Power Biomedical Applications

Ming-Zhong Li, Chio-In Ieong, Man-Kay Law, *Member, IEEE*, Pui-In Mak, *Senior Member, IEEE*, and Mang-I Vai, *Senior Member, IEEE*, Rui P. Martins, *Fellow, IEEE* 

Abstract: Portable and implantable biomedical applications usually have stringent power budgets to prolong battery lifetime, but loose operating frequency requirements because bio-signal bandwidths are small, typically below a few kHz. Sub-threshold digital circuits are well suited to such scenarios, as they offer a favorable power/speed tradeoff. This paper discusses the design of a sub-threshold standard cell library (SCL) using a standard 0.18-µm CMOS technology. A complete library of 56 standard cells is designed, and the methodology is verified through schematic design, transistor width scaling and layout design, as well as timing, power and functionality characterization. Our sub-threshold SCL is compared with a commercial SCL using a 5-stage ring oscillator and an FIR filter designated for ECG processing. Simulation results show that our library achieves a total power saving of 95.62% and a leakage power reduction of 97.54% when compared with the same design implemented with the commercial SCL.

## I. INTRODUCTION

Power consumption is critical for biomedical applications. For example, some implanted systems, such as cardiac pacemakers, are often powered by a non-rechargeable battery, so battery life is very important for the user experience [1]. Moreover, the operating frequencies of biomedical applications are usually low, since bio-signal bandwidths are typically below a few kHz. This motivates a methodology for designing a custom standard cell library (SCL) optimized for biomedical devices to achieve better power performance [2-4].

An SCL is a set of fundamental circuits for digital VLSI design. It contains Boolean logic functions (e.g. AND, OR, NOT), storage functions (e.g. flip-flops) and physical design aids (e.g. Buffer, TieHi, TieLo, EndCap). SCL-based digital VLSI design is achieved by instantiating and connecting standard cells to form the required topologies. The performance of the SCL is critical, as it directly affects the performance of the synthesized digital circuit. Normally, digital IC designers acquire the SCL from a foundry or an Electronic Design Automation (EDA) company; such a library contains pre-designed standard cells with the standard operating voltage and typical performance, for fast prototyping of mainstream applications.

Research financially supported by the Macao Science and Technology Development Fund (024/2009/A1 & 015/2012/A1) and Research Committee of University of Macau (UL006A/10-Y3/EEE/VMI/FST, UL006B/10-Y3/EEE/VMI/FST & MYRG115-FST12-LMK).

Ming-Zhong Li, Chio-In Ieong and Mang-I Vai are with the Biomedical Engineering Laboratory, FST and State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macao, China.

Man-Kay Law, Pui-In Mak and Rui P. Martins are with the State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macao, China.

R. P. Martins is on leave from Instituto Superior Técnico (IST) / TU of Lisbon, Portugal.

When developing a biomedical electronic device for which a high operating frequency is of secondary concern, digital circuits operating in the *sub-threshold* region, where the transistors are sometimes regarded as being "turned off", can achieve significant power savings. Nevertheless, a detailed design flow for creating a sub-threshold standard cell library is rarely published. This paper discusses the design considerations, design flow and verification of a sub-threshold SCL with a significant reduction in power consumption compared to a commercial SCL.

The following sections present the design methodology for a complete sub-threshold standard cell library containing 56 cells in a standard 0.18-µm CMOS technology with a nominal supply voltage of 1.8V and threshold voltages of $V_{tn}$ = 0.482V and $V_{tp}$ = 0.462V. Section II gives a brief overview of sub-threshold operation. Section III details the development of the standard cell library, including transistor sizing, layout and abstract design considerations, and the characterization of each cell operating in the sub-threshold region. Section IV compares our sub-threshold SCL with a commercial SCL using both a 5-stage ring oscillator and a 12-bit FIR filter designated for ECG detection, synthesized according to [5]. Finally, concluding remarks are given in Section V.

## II. BASIC PRINCIPLES OF SUB-THRESHOLD OPERATION

Traditionally, a CMOS transistor is considered to be "turned off" when the gate-source voltage ($V_{GS}$) is lower than the threshold voltage ($V_T$), and the associated drain-source current ($I_{DS}$) is viewed as leakage current. Theoretically, sub-threshold operation is the most power-efficient regime of operation of a transistor [1].

#### A. MOSFET in Sub-threshold Region

For a transistor operating in the sub-threshold region, the drain current [2] is given by

$$I_{sub} = I_0 e^{\frac{V_{GS} - V_T + \eta V_{DS}}{nV_{th}}} \left(1 - e^{\frac{-V_{DS}}{V_{th}}}\right) \tag{1}$$

$$I_0 = \mu_0 C_{ox} \frac{W}{L} (n - 1) V_{th}^2 \tag{2}$$

where $\mu_0$ is the mobility, $C_{ox}$ the oxide capacitance, $W$ and $L$ the transistor width and length, $n$ the sub-threshold slope factor, $V_{th}$ the thermal voltage, $\eta$ the DIBL coefficient, $V_{GS}$ the gate-to-source voltage, $V_{DS}$ the drain-to-source voltage, $V_T$ the transistor threshold voltage and $I_0$ the nominal current. From equations (1) and (2), $I_{sub}$ varies exponentially with the overdrive voltage $V_{GS} - V_T$. In a sub-threshold circuit, where $V_{DD}$ is lower than $V_T$ and $V_{GS}$ is at most $V_{DD}$, $I_{sub}$ therefore decreases exponentially as $V_{DD}$ is reduced.
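As a rough numerical illustration of equation (1), the sketch below evaluates the normalized current $I_{sub}/I_0$ at a few sub-threshold supply voltages. The slope factor `n = 1.5`, the DIBL coefficient `eta = 0` and the thermal voltage of about 25.85 mV are assumed illustrative values, not parameters reported in this paper; only the NMOS threshold voltage is taken from Section I.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q in volts at roughly 300 K (assumed)


def subthreshold_current(v_gs, v_ds, v_t, i_0=1.0, n=1.5, eta=0.0,
                         v_th=THERMAL_VOLTAGE):
    """Evaluate equation (1). Voltages are in volts; the result has the units of i_0."""
    gate_term = math.exp((v_gs - v_t + eta * v_ds) / (n * v_th))
    drain_term = 1.0 - math.exp(-v_ds / v_th)
    return i_0 * gate_term * drain_term


def subthreshold_swing(n=1.5, v_th=THERMAL_VOLTAGE):
    """Gate-voltage change (in volts) that changes the current by a factor of 10."""
    return n * v_th * math.log(10.0)


V_TN = 0.482  # NMOS threshold voltage quoted in Section I

for v_dd in (0.30, 0.35, 0.40, 0.45):
    # A conducting device in a sub-threshold gate sees V_GS = V_DS = V_DD.
    current = subthreshold_current(v_gs=v_dd, v_ds=v_dd, v_t=V_TN)
    print(f"V_DD = {v_dd:.2f} V -> I_sub / I_0 = {current:.3e}")

print(f"swing = {subthreshold_swing() * 1e3:.1f} mV/decade")
```

With these assumed parameters, lowering the gate voltage by one swing reduces the current tenfold, which is the exponential dependence on $V_{DD}$ described above.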

#### B. Energy Efficiency for Sub-threshold Operation

It has been shown that an optimal operating voltage exists and that the minimum energy point typically occurs in the sub-threshold region [6]. The total energy of an arbitrary circuit can be broken down into switching energy and leakage energy. Switching energy, leakage energy and total energy are modeled as

$$E_{switch} = C_{eff} V_{DD}^2 \tag{3}$$

$$E_{leakage} = W_{eff} I_{leakage} V_{DD} t_d L_{DP} \tag{4}$$

$$E_{Total} = C_{eff} V_{DD}^2 + W_{eff} I_{leakage} V_{DD} t_d L_{DP} \tag{5}$$

where $C_{eff}$ and $W_{eff}$ represent the average total switching capacitance and the normalized width contributing to leakage current, respectively, $L_{DP}$ is the logic depth in terms of the inverter delay, and $t_d$ and $I_{leakage}$ are the delay and leakage current of a characteristic inverter.

Equations (3) and (4) show that the switching energy decreases quadratically with the supply voltage, while the leakage energy carries an explicit linear dependence on the supply voltage. A circuit operating in the sub-threshold region can therefore consume significantly less energy than the same circuit operating in the strong inversion region. Based on the 0.18-µm CMOS process and equation (3), if the logic operates with a sub-threshold supply voltage equal to 1/4 of the nominal supply voltage (0.45V instead of 1.8V), a first-order estimate gives a dynamic power saving of up to $1 - (1/4)^2 = 93.75\%$ for the same switching capacitance and frequency, which is significant and beneficial for biomedical devices.
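The first-order estimate can be checked directly from equations (3) to (5). The following is a minimal sketch in which the function names are our own and $C_{eff}$ is assumed to be unchanged by voltage scaling, so that it cancels in the ratio.

```python
def switching_energy(c_eff, v_dd):
    """Equation (3): switching energy per operation."""
    return c_eff * v_dd ** 2


def leakage_energy(w_eff, i_leakage, v_dd, t_d, l_dp):
    """Equation (4): leakage energy per operation."""
    return w_eff * i_leakage * v_dd * t_d * l_dp


def total_energy(c_eff, w_eff, i_leakage, v_dd, t_d, l_dp):
    """Equation (5): sum of the switching and leakage energies."""
    return (switching_energy(c_eff, v_dd)
            + leakage_energy(w_eff, i_leakage, v_dd, t_d, l_dp))


def dynamic_saving(v_scaled, v_nominal):
    """Fractional reduction of equation (3) when V_DD drops from v_nominal
    to v_scaled, assuming C_eff is unchanged (so it cancels out)."""
    return 1.0 - switching_energy(1.0, v_scaled) / switching_energy(1.0, v_nominal)


V_NOMINAL = 1.8        # nominal supply of the 0.18-um process, in volts
V_SUB = V_NOMINAL / 4  # 0.45 V, one quarter of the nominal supply

print(f"first-order dynamic saving: {dynamic_saving(V_SUB, V_NOMINAL):.2%}")
```

The printed saving matches the 93.75% quoted in the text. The leakage term is not included in this ratio because $t_d$ and $I_{leakage}$ themselves change with $V_{DD}$ and would need device data to evaluate.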

## III. STANDARD CELL LIBRARY DESIGN

A standard cell library is a collection of basic building blocks providing Boolean logic functions (e.g. AND, OR, NOT), storage functions (e.g. flip-flops) and physical design aids (e.g. Buffer, TieHi, TieLo, EndCap). Our SCL development flow is shown in Figure 1. Initially, the transistor dimensions of each standard cell are scaled to meet the design requirements, and the cell must pass simulation at the global corners. The layout is then drawn and must be free of DRC and LVS violations.

Afterwards, cell characterization is performed to extract the timing, power and functionality information that is stored in the Liberty format (.lib) file of the library. Meanwhile, the abstract view of each cell is generated from the corresponding layout, and the Library Exchange Format (.lef) file is created. Both the .lib and .lef files are imported into the place and route tool for design optimization and automatic layout generation.

#### A. Transistor Dimension Scaling

This section describes the methodology for scaling a sub-threshold standard cell. According to equation (5), the smaller the transistor dimensions, the smaller the effective capacitance $C_{eff}$ and width $W_{eff}$, and hence the smaller the total energy consumed. Minimum transistor sizes are therefore expected to give minimum energy in sub-threshold operation. The propagation delay $t_d$ of a characteristic inverter is positively correlated with the gate capacitance $C_g$, which is also regarded as the switched capacitance:

$$t_d \propto C_g V_{DD} \tag{6}$$

$$C_g \propto C_L \tag{7}$$

where $C_L$ is the load capacitance. A smaller transistor thus gives a smaller delay $t_d$ in the sub-threshold region. Without taking process variation and mismatch into account, the minimum MOSFET dimension, which is a 0.22-µm width W by a 0.18-µm length L in our CMOS technology, therefore yields the minimum energy consumption and propagation delay.

Figure 1. SCL development flowchart

TABLE I. INVERTER CHAIN POWER CONSUMPTION VS. POWER SUPPLY VOLTAGE $V_{DD}$

| P/N ratio | 0.2V | 0.25V | 0.3V  | 0.35V | 0.4V | 0.45V |
|-----------|------|-------|-------|-------|------|-------|
| 1:1       | 5pW  | 6.7pW | 8.4pW | 10pW  | 12pW | 15pW  |
| 2:1       | 6pW  | 8.0pW | 10pW  | 12pW  | 15pW | 18pW  |
| 4:1       | 7pW  | 9.3pW | 12pW  | 14pW  | 17pW | 20pW  |

Based on the assumption mentioned above, transient simulations of a 7-stage inverter chain clocked at 250MHz are performed at global corners with fan-out of 4. Table I shows that a 1:1 ratio for PMOS and NMOS width results in minimum power consumption at different supply voltages.
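The reading of Table I can be made explicit with a short script. The sketch below simply transcribes the table and picks the P/N width ratio with the lowest power at each supply voltage; the variable and function names are our own.

```python
# Inverter chain power consumption from Table I, in picowatts.
SUPPLY_V = [0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
POWER_PW = {
    "1:1": [5, 6.7, 8.4, 10, 12, 15],
    "2:1": [6, 8.0, 10, 12, 15, 18],
    "4:1": [7, 9.3, 12, 14, 17, 20],
}


def best_ratio_per_supply(supply_v, power_pw):
    """Return {supply voltage: P/N ratio with the lowest power at that voltage}."""
    best = {}
    for column, v_dd in enumerate(supply_v):
        best[v_dd] = min(power_pw, key=lambda ratio: power_pw[ratio][column])
    return best


for v_dd, ratio in best_ratio_per_supply(SUPPLY_V, POWER_PW).items():
    print(f"V_DD = {v_dd} V: lowest power at P/N = {ratio}")
```

For every supply voltage in the table the 1:1 ratio is selected, which is the basis for keeping equal PMOS and NMOS widths.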

However, the influence of process variation and $V_T$ mismatch is expected to be significant for minimum-sized transistors. As a result, the transistors are globally upsized while keeping a 1:1 ratio between the PMOS and NMOS widths, to reduce the impact of process variation while limiting power consumption. Then, based on the scaled dimensions of the basic inverter INVX1, the method of logical effort [7] is adopted to size the rest of the basic cells, such as the NAND2X1 and NOR2X1 gates, in which the stacked PMOS and stacked NMOS transistors must be upsized to ensure similar pull-up and pull-down driving capabilities, as shown in Figure 2. Table II compares the propagation delays of the INVX1, NAND2X1 and NOR2X1 cells in the sub-threshold library and the commercial library, indicating that the sub-threshold standard cells operate faster than the commercial ones. From these basic cells, the more complex gates such as AND, OR, OR-AND-INVERT and AND-OR-INVERT can be determined. For cells with different drive strengths, the transistors are scaled to obtain a propagation delay comparable with that of the basic cell that belongs to the same footprint.

Figure 2. Logical efforts of INVX1, NAND2X1 and NOR2X1

Figure 3. Cell layout design rules

#### B. Layout Design Considerations

Once the transistor-level design functions as expected, the physical layout of each cell is drawn manually. The place and route tool we employ places the cells in rows, and the router wires the standard cells along the vertical and horizontal routing grids. The design rules listed below and illustrated in Figure 3 should be strictly followed during cell layout.

- All the I/O pins (except power supply pins) must be placed on the intersections of the vertical and horizontal grids
- Cell height must be a multiple of the horizontal grid spacing and cell width a multiple of the vertical grid spacing
- Use bottom metal (usually metal 1) to draw the layout
- Filler cells should be included in the library to guarantee the continuity of supply rails and n-well

Verification is performed to ensure that each cell in the library passes Design Rule Check (DRC) and Layout Versus Schematic (LVS) without any violations.
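The grid-related rules above lend themselves to a simple automated check before DRC and LVS. The sketch below is only an illustration: the `Pin` class, the `check_cell` function and the pitch values are hypothetical, and pins are assumed to sit on grid lines measured from the cell origin with no half-pitch offset, which may differ from the convention of a given technology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    name: str
    x_nm: int  # pin centre, measured from the cell origin, in nanometres
    y_nm: int


def check_cell(width_nm, height_nm, pins, v_pitch_nm, h_pitch_nm):
    """Return a list of grid-rule violations (an empty list means the cell passes).

    v_pitch_nm: spacing between vertical routing grid lines (x direction).
    h_pitch_nm: spacing between horizontal routing grid lines (y direction).
    """
    violations = []
    if width_nm % v_pitch_nm != 0:
        violations.append("cell width is not a multiple of the vertical grid spacing")
    if height_nm % h_pitch_nm != 0:
        violations.append("cell height is not a multiple of the horizontal grid spacing")
    for pin in pins:
        if pin.x_nm % v_pitch_nm != 0 or pin.y_nm % h_pitch_nm != 0:
            violations.append(f"pin {pin.name} is not on a grid intersection")
    return violations


# Hypothetical pitches and cell geometry, chosen only to exercise the checks.
V_PITCH_NM, H_PITCH_NM = 660, 560
good_pins = [Pin("A", 660, 2240), Pin("Y", 1320, 2800)]
bad_pins = [Pin("A", 700, 2240), Pin("Y", 1320, 2800)]

print(check_cell(1980, 5040, good_pins, V_PITCH_NM, H_PITCH_NM))
print(check_cell(1980, 5040, bad_pins, V_PITCH_NM, H_PITCH_NM))
```

Integer nanometre coordinates are used so that the modulo tests are exact; floating-point micrometre values would need a tolerance.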



Figure 4. Library characterization process

TABLE II. PROPAGATION DELAY COMPARISON (FAN-OUT OF 4 LOADING, 0.45V SUPPLY VOLTAGE)

| Transition | Library   | INVX1 (ns) | NOR2X1 (ns) | NAND2X1 (ns) |
|------------|-----------|------------|-------------|--------------|
| Pull-up    | Sub. Lib. | 10.322     | 18.612      | 10.057       |
| Pull-up    | Com. Lib. | 11.3713    | 25.8616     | 10.0826      |
| Pull-down  | Sub. Lib. | 9.533      | 9.236       | 13.8746      |
| Pull-down  | Com. Lib. | 10.1206    | 10.0286     | 16.792       |
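To quantify the comparison in Table II, the relative delay reduction of each sub-threshold cell against its commercial counterpart can be computed as below. This is a small sketch over the tabulated values only; the dictionary layout and function name are our own.

```python
# Propagation delays from Table II in nanoseconds:
# (sub-threshold library, commercial library).
DELAY_NS = {
    ("pull-up", "INVX1"): (10.322, 11.3713),
    ("pull-up", "NOR2X1"): (18.612, 25.8616),
    ("pull-up", "NAND2X1"): (10.057, 10.0826),
    ("pull-down", "INVX1"): (9.533, 10.1206),
    ("pull-down", "NOR2X1"): (9.236, 10.0286),
    ("pull-down", "NAND2X1"): (13.8746, 16.792),
}


def delay_reduction(sub_ns, com_ns):
    """Fraction by which the sub-threshold cell is faster than the commercial one."""
    return (com_ns - sub_ns) / com_ns


REDUCTION = {key: delay_reduction(*pair) for key, pair in DELAY_NS.items()}

for (transition, cell), value in REDUCTION.items():
    print(f"{cell:8s} {transition:9s}: {value:.1%} lower delay")
```

All six entries come out positive, with the largest gain on the NOR2X1 pull-up transition and the smallest on the NAND2X1 pull-up transition.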

#### C. Standard Cell Characterization

Upon completion of the physical implementation, cell characterization is performed to generate the timing models of the library, which are used for synthesizing the behavioral code of a design and for timing optimization during the place and route step. The characterization process of a standard cell library is outlined in Figure 4. The main inputs are:

- A SPICE-format netlist that contains the detailed transistor devices, resistance and capacitance for each cell.
- A setup file that has all the information about the simulation conditions (e.g. corners, voltage, temperature, input slew, output loading).
- A SPICE model file provided by foundry.

These files are passed to the Encounter Library Characterizer to generate a library database that contains timing, power and logic function for each of the cells, namely, the liberty format (.lib) for synthesis, placement and routing.
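The setup file effectively defines a grid of simulation conditions, and each cell is simulated at every point of that grid. The sketch below enumerates such a grid; it is not the Encounter Library Characterizer setup syntax, and the corner names, temperature, slew and load values are hypothetical placeholders (only the two supply voltages are taken from the voltages used in this paper).

```python
from itertools import product

# Hypothetical characterization conditions; replace with the values of the real setup.
CORNERS = ["TT", "FF", "SS", "FS", "SF"]
SUPPLY_V = [0.3, 0.45]
TEMPERATURE_C = [25]
INPUT_SLEW_NS = [1.0, 5.0, 20.0]
OUTPUT_LOAD_FF = [1.0, 4.0, 16.0]


def characterization_runs():
    """Return one dict per SPICE simulation needed for a single timing arc."""
    keys = ("corner", "vdd", "temp_c", "slew_ns", "load_ff")
    grid = product(CORNERS, SUPPLY_V, TEMPERATURE_C, INPUT_SLEW_NS, OUTPUT_LOAD_FF)
    return [dict(zip(keys, point)) for point in grid]


runs = characterization_runs()
print(f"{len(runs)} simulations per timing arc")
print(runs[0])
```

The slew and load axes correspond to the two index dimensions of a lookup-table timing model, so the run count grows quickly with the number of cells and arcs.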

#### D. Abstract View Creation

An abstract view is generated by Cadence Abstract Generator and contains basic information such as the cell boundary, pin names and locations, metal layers and the overall size of blockages. This information is exported as a Library Exchange Format (.lef) file along with technology information such as the routing grid, via sizes and coordinates, metal blockage coordinates and pin locations. After placement and routing is performed, the abstract cell view can be replaced by the corresponding physical layout.

## IV. LIBRARY SIMULATION AND VALIDATION

A complete sub-threshold SCL with 56 cells is created using the procedures outlined in Section III. To characterize the performance of the library, a 5-stage inverter-based ring oscillator operating at $V_{DD} < V_T$ is simulated, and its performance (e.g. frequency, power) is compared with that of the same circuit built with a commercial library. Identical constraints are enforced in both implementations. The results are summarized in Table III. The sub-threshold SCL achieves a higher operating frequency with lower power consumption at every supply voltage from 0.3 V to 0.45 V. This is mainly due to the reduction in capacitive loading that results from the sizing optimization.

TABLE III. RING OSCILLATOR PERFORMANCE SUMMARY

| Supply voltage | Frequency, Sub. Lib. (kHz) | Frequency, Com. Lib. (kHz) | Power consumption, Sub. Lib. (nW) | Power consumption, Com. Lib. (nW) |
|----------------|----------------------------|----------------------------|-----------------------------------|-----------------------------------|
| 0.3V           | 2.607                      | 1.855                      | 0.2336                            | 0.4299                            |
| 0.35V          | 8.139                      | 5.577                      | 0.5835                            | 1.0675                            |
| 0.4V           | 22.941                     | 16.261                     | 1.4187                            | 2.5763                            |
| 0.45V          | 56.807                     | 41.334                     | 3.3544                            | 6.05952                           |

Figure 5. Area/Power ratio vs. supply voltage

TABLE IV. No. of Buffers Adopted from Different Liberty Files

| Buffer        | 0.3V | 0.45V | 0.6V | 1.2V | 1.8V |
|---------------|------|-------|------|------|------|
| Inverting     | 2756 | 663   | 1870 | 337  | 337  |
| Non-Inverting | 2782 | 2460  | 2292 | 2    | 2    |
| Total         | 5538 | 3123  | 4162 | 339  | 339  |
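The ring oscillator results in Table III can be condensed into a speed ratio, a power ratio and an energy per oscillation cycle (power divided by frequency). The sketch below uses only the tabulated numbers; the energy-per-cycle metric is our own derived quantity and is not reported in the paper.

```python
# Table III: supply voltage -> {library: (frequency in kHz, power in nW)}.
RING_OSC = {
    0.30: {"sub": (2.607, 0.2336), "com": (1.855, 0.4299)},
    0.35: {"sub": (8.139, 0.5835), "com": (5.577, 1.0675)},
    0.40: {"sub": (22.941, 1.4187), "com": (16.261, 2.5763)},
    0.45: {"sub": (56.807, 3.3544), "com": (41.334, 6.05952)},
}


def energy_per_cycle_pj(freq_khz, power_nw):
    """nW / kHz = 1e-9 W / 1e3 Hz = 1e-12 J, i.e. picojoules per cycle."""
    return power_nw / freq_khz


for v_dd, libs in RING_OSC.items():
    f_sub, p_sub = libs["sub"]
    f_com, p_com = libs["com"]
    print(f"V_DD = {v_dd:.2f} V: "
          f"frequency x{f_sub / f_com:.2f}, power x{p_sub / p_com:.2f}, "
          f"energy/cycle {energy_per_cycle_pj(f_sub, p_sub):.4f} pJ (sub) vs "
          f"{energy_per_cycle_pj(f_com, p_com):.4f} pJ (com)")
```

Because the sub-threshold library is both faster and lower in power at each voltage, its energy per cycle is lower at every row of the table.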

To demonstrate the feasibility of our sub-threshold SCL, a 12-bit FIR filter designated for ECG signal transformation is implemented based on [5]. With identical design constraints, Figure 5 shows the result of synthesizing the FIR filter using Liberty files generated at different supply voltages. An optimal point exists in the sub-threshold region, indicating that a significant power reduction can be achieved if the system operates there. The drawback, however, is the larger number of buffers needed to maintain the system operating speed, as shown in Table IV, and the corresponding increase in chip area. Within the sub-threshold regime, the design trade-off between area and power consumption is presented and an optimal operating condition is determined.

The detailed report comparing the sub-threshold library with the commercial library is shown in Table V, under the assumption of an activity factor AF = 1. Similar to the case of the 5-stage ring oscillator, the filter based on the sub-threshold library achieves a total power saving of 95.62% and a leakage power reduction of 97.54% when compared with the same design implemented with the commercial library. The major drawback is that the number of cells grows to about 1.76 times that of the commercial implementation (5298 versus 3005), resulting in an increase in area. The simulated waveforms of the schematic are shown in Figure 6. Further study on the optimization of the SCL is ongoing.

TABLE V. FIR FILTER PERFORMANCE SUMMARY WITH AF=1

| Instance        | No. of Cells | Leakage Power (nW) | Switching Power (µW) | Total Power (µW) |
|-----------------|--------------|--------------------|----------------------|------------------|
| FIR (Com. Lib.) | 3005         | 1104.383           | 2020.7918            | 2021.8962        |
| FIR (Sub. Lib.) | 5298         | 27.184             | 88.5499              | 88.577           |
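The headline savings follow directly from Table V. The short check below recomputes them from the tabulated values and also verifies that each total equals the switching power plus the leakage power after unit conversion; the dictionary keys are our own naming.

```python
# Table V: FIR filter results with activity factor AF = 1.
COM = {"cells": 3005, "leakage_nw": 1104.383, "switching_uw": 2020.7918, "total_uw": 2021.8962}
SUB = {"cells": 5298, "leakage_nw": 27.184, "switching_uw": 88.5499, "total_uw": 88.577}


def saving(sub_value, com_value):
    """Fractional reduction of the sub-threshold value relative to the commercial one."""
    return 1.0 - sub_value / com_value


total_saving = saving(SUB["total_uw"], COM["total_uw"])
leakage_saving = saving(SUB["leakage_nw"], COM["leakage_nw"])
cell_ratio = SUB["cells"] / COM["cells"]

print(f"total power saving:      {total_saving:.2%}")
print(f"leakage power reduction: {leakage_saving:.2%}")
print(f"cell count ratio:        {cell_ratio:.2f}x")

# Consistency check: total = switching + leakage (leakage converted from nW to uW).
for row in (COM, SUB):
    assert abs(row["switching_uw"] + row["leakage_nw"] / 1000 - row["total_uw"]) < 1e-3
```

The computed values reproduce the 95.62% total power saving, the 97.54% leakage reduction and the roughly 1.76 times cell count quoted in the text.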



Figure 6. Schematic simulation of FIR filter designated for ECG detection

## V. CONCLUSION

Ultra-low power operation is compelling for biomedical applications. In this work, a verified sub-threshold standard cell library design flow is presented, along with a methodology for scaling transistor dimensions for operation in the sub-threshold region. In total, 56 standard cells based on a standard 0.18-µm CMOS technology are created according to the design flow, which involves the layout design and validation of each standard cell as well as its characterization for timing, power and functionality. A ring oscillator based on our sub-threshold SCL achieves a higher operating frequency with lower power consumption than one based on a commercial SCL at identical supply voltages from 0.3V to 0.45V. Similarly, the FIR filter design based on the sub-threshold library achieves a total power saving of 95.62% and a leakage power reduction of 97.54% compared with the same design implemented with the commercial SCL.

#### ACKNOWLEDGMENT

The authors would like to thank Miss Meng-Yu Yan, Mr. Yang Jiang, Mr. Cheng Dong and Mr. Tan-Tan Zhang for the valuable discussions.

#### REFERENCES

- [1] Rahul Sarpeshkar, Ultra Low Power Bioelectronics: Fundamentals, Biomedical Applications, and Bio-Inspired Systems, Cambridge University Press, 1st edition, pp. 5-8, 2010.
- [2] Bo Liu, Ashouei, M., Huisken, J., de Gyvez, J.P., "Standard cell sizing for subthreshold operation," 49th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 962-967, June 2012.
- [3] S. Amarchinta, H. Kanithar, D. Kudithipudi, "Robust and High Performance Subthreshold Standard Cell Design," 52nd IEEE International Midwest Symposium on Circuits and Systems, pp. 1183-1186, Aug. 2009.
- [4] J. Kwong and A. Chandrakasan, "Variation-driven Device Sizing for Minimum Energy Sub-threshold Circuits," Int. Symp. Low Power Electronics and Design, pp. 8-13, 2006.
- [5] Ieong, C.-I., Mak, P.-I., Lam, C.-P., Dong, C., Vai, M.-I., Mak, P.-U., Pun, S.-H., Wan, F., Martins, R. P., "A 0.83-µW QRS Detection Processor Using Quadratic Spline Wavelet Transform for Wireless ECG Acquisition in 0.35-µm CMOS," IEEE Trans. on Biomedical Circuits and Systems, vol. 6, no. 6, pp. 586-595, Dec. 2012.
- [6] Wang, A., Chandrakasan, A., "A 180-mV subthreshold FFT processor using a minimum energy design methodology," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 310-319, Jan. 2005.
- [7] I. E. Sutherland, R. F. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1999.